- TUNING THE PORTABLE FORTH ENVIRONMENT -*- indented-text -*-
- #####################################
-
- 1) Loop unrolling in the inner interpreter
- ==========================================
-
- The most time critical piece of code in pfe is the inner interpreter,
- a tight loop calling all primitives compiled into a high-level
- definition. You'll find it in the file support.c, function run_forth().
-
- On some CPUs it saves a significant amount of time to unroll the code
- of the inner interpreter several times, so that it need not jump back
- to the start of the loop after every primitive is executed. On other
- CPUs it doesn't help or even makes things slightly slower.
-
- For example, pfe's benchmark performance on a 486 is about 15% better
- with an unrolled NEXT (the fetch-and-execute step of the inner
- interpreter), while on a Pentium it becomes worse.
-
- You'll have to try which is better on your machine. To enable the
- feature, add the following compiler option to the Makefile:
-
- -DUNROLL_NEXT
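-
- To see what the option changes, here is a minimal sketch of a
- threaded-code inner interpreter with an optionally unrolled loop. It is
- not pfe's run_forth(); the names prim, run, lit1, plus and halt are
- hypothetical. It assumes that a halt primitive ends the loop by setting
- a flag and stepping ip back onto itself, so that any NEXTs still
- pending in an unrolled pass simply call halt again.

```c
/* Hypothetical miniature inner interpreter; NOT pfe's run_forth(). */
typedef void (*prim)(void);        /* a primitive is a C function */

static const prim *ip;             /* virtual instruction pointer */
static long stack[64];             /* data stack, grows downward */
static long *sp = stack + 64;      /* virtual data stack pointer */
static int halted;                 /* set by halt() to leave the loop */

/* NEXT: fetch the primitive ip points at, advance ip, call it. */
#define NEXT ((*ip++)())

static void lit1(void) { *--sp = 1; }              /* push 1 */
static void dupe(void) { --sp; sp[0] = sp[1]; }    /* DUP */
static void plus(void) { sp[1] += sp[0]; ++sp; }   /* + */

/* Step ip back onto halt itself, so any NEXTs still pending in an
   unrolled pass just call halt again instead of running off the end. */
static void halt(void) { halted = 1; --ip; }

static long run(const prim *code)
{
    ip = code;
    sp = stack + 64;
    halted = 0;
    while (!halted) {
#ifdef UNROLL_NEXT
        NEXT; NEXT; NEXT; NEXT;    /* four primitives per backward jump */
#else
        NEXT;                      /* one primitive per backward jump */
#endif
    }
    return sp[0];                  /* top of stack */
}
```

- Compiling with -DUNROLL_NEXT switches run() to the unrolled body; for
- the threaded program {lit1, dupe, plus, dupe, plus, halt} both variants
- leave 4 on top of the stack.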
-
-
- 2) Using global register variables
- ==================================
-
- pfe is designed for maximum portability: it can be compiled
- with a variety of compilers on many systems. Obviously this prevented
- me from squeezing the last bit of performance out of any special
- system.
-
- Fortunately there's a way to tune it up significantly with little
- effort, provided you have GNU-C at hand.
-
- Let me explain: as most of you probably know, a Forth interpreter
- traditionally contains a so-called virtual machine; so does pfe. This
- virtual machine consists of several virtual registers and a basic set
- of operations. The virtual registers are:
-
- ip an instruction pointer
- sp the data stack pointer
- rp the return stack pointer
- w an auxiliary register
-
- In pfe there are additionally:
-
- lp pointer to local variables
- fp floating point stack pointer
-
- In a traditional assembler-based Forth implementation these virtual
- registers would be mapped to physical registers of the CPU at hand.
- How efficient such an implementation is depends heavily on how
- cleverly this mapping is done.
-
- pfe has no choice but to declare C-language global variables to
- represent these virtual registers. These variables are accessed *very*
- frequently.
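-
- As an illustration only (the types and stack sizes below are
- assumptions, not pfe's actual declarations), the virtual registers
- might look like this as plain C globals. A primitive such as DUP then
- has to load and store the global sp from memory on every use.

```c
/* Hypothetical types; pfe's own declarations live in its headers. */
typedef long Cell;                   /* one stack cell */
typedef void (*Xt)(void);            /* a primitive */

Cell stack[256];                     /* data stack memory */
Cell rstack[256];                    /* return stack memory */
double fstack[64];                   /* floating-point stack memory */

/* The virtual machine registers as ordinary global variables. */
const Xt *ip;                        /* instruction pointer */
Cell *sp = stack + 256;              /* data stack pointer (grows down) */
Cell *rp = rstack + 256;             /* return stack pointer */
Cell w;                              /* auxiliary register */
Cell *lp;                            /* pointer to local variables */
double *fp = fstack + 64;            /* floating-point stack pointer */

/* DUP written against the globals, in the same style as pfe's dupe. */
void dupe(void)
{
    --sp;            /* every access reads and writes the global sp */
    sp[0] = sp[1];
}
```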
-
- Now GNU-C allows us to put global variables in registers! Obviously
- the number of registers in a CPU is limited and the use of registers
- by library functions and the compiler itself interferes.
-
- In spite of these restrictions it is possible to find a niche, even on
- an i386, in which to place the two most important virtual registers,
- resulting in a performance boost of about 50%. (Just one more detail
- that shows what a great job the GNU-C developers did.)
-
-
- If your system is one of those known to the config script, then all
- provisions for using global register variables have already been made.
- You can enable or disable the use of global register variables in
- `src/makefile' by specifying the command line option `-DUSE_REGS'
- (the default) or removing it.
-
- If your system isn't known to the config script, first make sure
- you have a stable port according to the instructions in the file
- `INSTALL'. Then read the next section to enable the use of register
- variables on your system. If all works well, please send me your
- changes.
-
-
- Warning:
-
- Current versions of gcc (<= 2.6.0) seem to generate incorrect code in
- very special situations when global register variables are used. This
- has been reported and is fixed in later gcc versions.
-
- If something that worked in previous versions stops working, please
- check whether it works again after recompiling pfe without
- -DUSE_REGS. Please inform me of such cases:
- duz@roxi.rz.fht-mannheim.de <Dirk Zoller>
-
-
- Choosing registers to use
- =========================
-
- When you use global register variables in GNU-C, you have to state
- explicitly which machine register holds each global variable you
- declare "register". The syntax is:
-
- register type variable_name asm ("machine register name");
-
- instead of just
-
- type variable_name;
-
- As far as I can see, choosing machine registers to use for global register
- variables is just a matter of trial and error.
-
- First find out how registers are named on your machine. Not how the
- manufacturer names them but how gas, the GNU-assembler, names them.
- It's easy: simply use gcc to compile one of the C files with option
- -S. I changed the `makefile' so that a simple `make core.s' does this.
-
- Then look at `core.s': You don't have to know much of assembly
- language programming and even less of the particular CPU. All you are
- interested in is: what are the registers? In `core.s' search for the
- label `dupe_', i.e. the compiled function that does the work of the
- Forth word `DUP'. The C source for dupe is:
-
- Code (dupe)
- {
- --sp;
- sp[0] = sp[1];
- }
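-
- Code is a pfe macro. The assembler label dupe_ suggests that
- Code(dupe) declares a function named dupe_; the following
- self-contained sketch assumes exactly that (the macro body, the Cell
- type and the stack size are guesses, not copied from pfe), so you can
- compile it with gcc -S yourself and look for the label.

```c
typedef long Cell;                 /* assumed cell type */

Cell stack[16];                    /* hypothetical data stack */
Cell *sp = stack + 16;             /* global virtual sp, grows down */

/* Assumed expansion: Code(dupe) -> void dupe_ (void) */
#define Code(X) void X##_ (void)

Code (dupe)
{
    --sp;                          /* make room for the copy */
    sp[0] = sp[1];                 /* copy the old top of stack */
}
```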
-
-
- On an RS/6000 (where you won't have to do this because I did it
- already) using gcc you'd find the following assembler lines generated
- for dupe_:
-
- .dupe_:
- l 11,LC..106(2)
- l 9,0(11)
- cal 0,-4(9)
- st 0,0(11)
- l 0,0(9)
- st 0,-4(9)
- br
-
- Reading more of the generated assembler source suggested that
- - gcc refers to registers by their numbers only when talking to the
-   assembler, and
- - gcc never uses registers numbered around 16, although the CPU seems
-   to have 32 such registers.
-
- Next edit the file `src/virtual.h'. Add a system specific section of
- preprocessor definitions naming CPU registers to use for virtual
- machine registers like this:
-
- ...
- #elif AIX3
-
- # define REGIP "13"
- # define REGSP "14"
- # define REGRP "15"
- # define REGW "16"
- # define REGLP "17"
- # define REGFP "18"
-
- #elif...
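-
- The REGxx strings still have to be turned into declarations. A minimal
- sketch of that step, assuming (this is not quoted from virtual.h) that
- each virtual register falls back to an ordinary global whenever
- USE_REGS or its REGxx macro is missing, which lets you try a partial
- mapping such as REGSP alone. The Cell and Xt types are hypothetical.

```c
typedef long Cell;                     /* assumed cell type */
typedef void (*Xt)(void);              /* assumed primitive type */

/* A virtual register becomes a global register variable only when
   USE_REGS is on AND a machine register was named for it; otherwise it
   stays an ordinary global. The register branch needs GNU C; __asm__ is
   GCC's spelling of asm that also works in strict ISO modes. Global
   register variables may not have initializers. */
#if defined(USE_REGS) && defined(REGIP)
register const Xt *ip __asm__ (REGIP);
#else
const Xt *ip;
#endif

#if defined(USE_REGS) && defined(REGSP)
register Cell *sp __asm__ (REGSP);
#else
Cell *sp;
#endif

#if defined(USE_REGS) && defined(REGRP)
register Cell *rp __asm__ (REGRP);
#else
Cell *rp;
#endif
/* w, lp and fp would follow the same pattern. */

/* Primitives are written the same way whichever declaration was chosen. */
void dupe(void) { --sp; sp[0] = sp[1]; }
```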
-
- OK, finding the full set took a little more experimentation. You may
- want to start with only REGSP or REGIP.
-
- After enabling these declarations with the -DUSE_REGS command line
- option another `make core.s' yields the following translation for DUP:
-
- .dupe_:
- cal 14,-4(14)
- l 0,4(14)
- st 0,0(14)
- br
-
- Quite a difference!
-
- If your CPU has different types of registers for data and for
- pointers, then pfe needs the pointer registers. (On the M68k, use the
- Ax registers, not the Dx.)
-
- If your CPU doesn't have enough free registers, assign machine
- registers to the virtual registers at the top of the list above
- first. They are ordered by importance.
-
- Then do a `make new' with option -DUSE_REGS. If you get compiler
- errors or warnings about `spilled' or `clobbered' registers, then
- change the mapping until it compiles quietly. There's a good chance
- that it will then run, and if it does, it will run significantly
- faster than before.
-
- Good luck!
-
- Dirk
-